Czech-to-Slovak adapted broadcast news transcription system
Abstract
The first broadcast news (BN) transcription system for Slovak is introduced. It employs the same modules as the system we developed earlier for Czech. We exploit the similarity between the two languages in three ways: in efficient lexicon building, in mapping Slovak-specific (rarely occurring) phonemes onto Czech ones, and in low-resource cross-lingual adaptation of the acoustic model. The system uses a 166K-word lexicon and achieves a 23.6% word error rate (WER) on the Slovak part of the European COST278 BN database, a result only 5% behind that of the original, long-term optimized Czech system. Similar results were also achieved on recently recorded data from four Slovak stations.
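For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the toy sentences are illustrative assumptions; the abstract does not describe the scoring tool the authors used, and real evaluations usually apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative helper (not from the paper); no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy example: one substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

On this reading, a WER of 23.6% corresponds to roughly 24 word errors per 100 reference words.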
Similar resources
TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation
This article presents an overview of the existing acoustic corpora suitable for the task of automatic broadcast news transcription in the Slovak language. The TUKE-BNews-SK database, created in our department, was built to support application development for automatic broadcast news processing and spontaneous speech recognition in Slovak. The audio corpus is composed of 479 Slovak TV...
Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon
We present a system developed for fully automated processing of Czech spoken broadcast programs. It includes modules for unsupervised segmentation of the audio stream, speaker and gender recognition followed by speaker adaptation, and our own speech decoder designed for extremely large vocabularies. Compared to our previous results reported in 2004, the new system reduced the WER (evaluated on the Czec...
Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows: first, an initial language model is trained with the broadcast news transcriptions available during the development period. Then the language model is adapted over time with more recent broadcast news transcriptions and ...
Very large vocabulary speech recognition system for automatic transcription of Czech broadcast programs
This paper describes the first speech recognition system capable of transcribing a wide range of spoken broadcast programs in the Czech language with an OOV rate below 3 per cent. To achieve that level we had to a) create an optimized 200k-word vocabulary with multiple text and pronunciation forms, b) extract an appropriate language model from a 300M-word text corpus and c) develop our own de...
Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources
We present methods and procedures designed for cost-efficient adaptation of an existing speech recognition system to Polish. The system (originally built for the Czech language) is adapted using common texts and speech recordings accessible from Polish web pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps, which include: a) an initial bootstrapping phase th...
Publication date: 2008